Case Study of Scientific Data Processing on a Cloud Using Hadoop

Authors

  • Chen Zhang
  • Hans De Sterck
  • Ashraf Aboulnaga
  • Haig Djambazian
  • Robert Sladek

Abstract

With the increasing popularity of cloud computing, Hadoop has become a widely used open-source cloud computing framework for large-scale data processing. However, few efforts have been made to demonstrate that Hadoop applies to real-world scenarios outside server-side computations such as web indexing. In this paper, we use the Hadoop cloud computing framework to develop a user application that processes scientific data on clouds. We describe a simple extension to Hadoop's MapReduce that allows it to handle scientific data processing problems with arbitrary input formats and gives explicit control over how the input is split. We use this approach to build a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. We also discuss how the approach can be generalized to more complicated scientific data processing problems.
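The abstract does not say how the extension is implemented. The Python sketch below is only a hypothetical illustration of the general idea of explicit split control, not the authors' Hadoop code. It groups input files by a user-chosen key so that every image of one time-lapse sequence goes to the same map task, rather than letting the framework divide the input by size. The names `WorkSplit`, `make_splits`, and `run_map`, and the `wellNN_tNNN.tif` file-naming scheme, are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class WorkSplit:
    """Hypothetical unit of work handed to a single map task."""
    split_id: int
    items: tuple


def make_splits(items: Sequence[str], group_key: Callable[[str], str]) -> List[WorkSplit]:
    """Group inputs by a user-supplied key so related files stay in one split.

    Assumption: all files that share a key (e.g. one image sequence)
    must be processed together by the same task.
    """
    groups: Dict[str, List[str]] = {}
    for item in items:
        groups.setdefault(group_key(item), []).append(item)
    # Sort keys and members so split numbering is deterministic.
    return [
        WorkSplit(i, tuple(sorted(members)))
        for i, (_, members) in enumerate(sorted(groups.items()))
    ]


def run_map(splits: Sequence[WorkSplit], map_fn: Callable[[tuple], object]) -> Dict[int, object]:
    """Apply map_fn to each whole split, as one map task per split would."""
    return {s.split_id: map_fn(s.items) for s in splits}


if __name__ == "__main__":
    # Hypothetical file names: <sequence>_<frame>.tif
    files = ["well02_t001.tif", "well01_t000.tif", "well01_t001.tif",
             "well02_t000.tif", "well01_t002.tif"]
    splits = make_splits(files, lambda name: name.split("_")[0])
    print(run_map(splits, len))  # frames per sequence: {0: 3, 1: 2}
```

In Hadoop itself, this kind of control is normally exercised through a custom `InputFormat`, whose `getSplits` method decides how the job's input is divided among map tasks.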

Publication date: 2009